Combining Rule-based and Data-driven Techniques for Grammatical Relation Extraction in Spoken Language

Authors

  • Kenji Sagae
  • Alon Lavie
Abstract

We investigate an aspect of the relationship between parsing and corpus-based methods in NLP that has received relatively little attention: coverage augmentation in rule-based parsers. In the specific task of determining grammatical relations (such as subjects and objects) in transcribed spoken language, we show that a combination of rule-based and corpus-based approaches, where a rule-based system is used as the teacher (or an automatic data annotator) to a corpus-based system, outperforms either system in isolation.

Similar resources

Parsing of Grammatical Relations for Databases of Spoken Language

Despite the significant advances in syntactic parsing of written text, the application of these techniques to spontaneous spoken language has received more limited attention. The explosive growth of available corpora of transcribed spoken language opens up new opportunities in that direction. High accuracy parsers for spoken language will in turn provide a platform for development of a wide ran...

Parsing of Grammatical Relations in Transcripts of Parent-Child Dialogs Thesis Summary

Automatic analysis of syntax is one of the core problems in natural language processing. Despite significant advances in syntactic parsing of written text, the application of these techniques to spontaneous spoken language has received more limited attention. The recent explosive growth of online, accessible corpora of spoken language interactions opens up new opportunities for the development ...

A Multi-Strategy Approach for Parsing of Grammatical Relations in Transcripts of Parent-Child Dialogs

Automatic analysis of syntax is one of the core problems in natural language processing. Despite significant advances in syntactic parsing of written text, the application of these techniques to spontaneous spoken language has received more limited attention. The recent explosive growth of online, accessible corpora of spoken language interactions opens up new opportunities for the development ...

Application of the rule extraction method to evaluate seismicity of Iran

Assessing seismic hazards involves specifying the likelihood, magnitude and location of earthquakes in a region. Predicting seismic hazards is the first step in reducing the damage caused by an earthquake. In this study, to fully utilize all the known parameters that may affect the occurrence of earthquakes (mb ≥ 4.5), a data-driven rule-extraction method called the...

Automatic Conversion of a Persian Dependency Treebank to a Constituency Treebank

There are two major types of treebanks: dependency-based and constituency-based. Both have applications in natural language processing and computational linguistics. Several dependency treebanks have been developed for Persian. However, no large constituency treebank is available for this language. In this paper, we propose an algorithm for automatic conversion of a depe...

Publication date: 2003